bootstrap: add cluster_id to the mysql.tidb table (#59511)#65564
Conversation
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
@YangKeao This PR has conflicts, I have hold it. |
|
@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
Signed-off-by: Yang Keao <yangkeao@chunibyo.icu>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdded PD-backed cluster_id persistence to Changes
Sequence Diagram(s)sequenceDiagram
participant Bootstrap as "Bootstrap\n(process)"
participant PD as "PD\n(Cluster ID store)"
participant MySQL as "mysql.tidb\n(system table)"
rect rgba(200,230,201,0.5)
Bootstrap->>PD: request cluster_id
PD-->>Bootstrap: return cluster_id
end
rect rgba(187,222,251,0.5)
Bootstrap->>MySQL: write or update `cluster_id` in mysql.tidb
MySQL-->>Bootstrap: ack
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…511-to-release-8.5 Signed-off-by: Yang Keao <yangkeao@chunibyo.icu>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/session/bootstrap_test.go`:
- Around line 2595-2628: The test currently references the upstream version
constant version241 and is named TestWriteClusterIDToMySQLTiDBWhenUpgradingTo242
which doesn't exist on release-8.5; update the test to use the release-8.5
constant (replace version241 with version223) and rename the test function to
match the release naming (e.g., TestWriteClusterIDToMySQLTiDBWhenUpgradingTo224)
so the test compiles and targets the correct release version; adjust any related
references in the same function (seV241, ver241) to the new names to keep
identifiers consistent (e.g., seV223, ver223 or matching the new test name).
In `@pkg/session/bootstrap.go`:
- Around line 1429-1430: The upgrade dispatch slice is missing upgradeToVer219
through upgradeToVer223, causing those migrations to be skipped; update the
slice that currently lists upgradeToVer218 and upgradeToVer224 to re-insert
upgradeToVer219, upgradeToVer220, upgradeToVer221, upgradeToVer222, and
upgradeToVer223 in order between upgradeToVer218 and upgradeToVer224 so the
migration runner (the function that iterates this slice) will execute those
versions' migration functions (e.g., upgradeToVer219 ... upgradeToVer223) before
reaching upgradeToVer224.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 86a48419-de2c-45bb-815a-9f678c95e007
📒 Files selected for processing (4)
br/pkg/restore/snap_client/systable_restore_test.gopkg/ddl/schematracker/checker.gopkg/session/bootstrap.gopkg/session/bootstrap_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/session/bootstrap.go`:
- Around line 3327-3340: The writeClusterID function currently dereferences the
PD client returned by s.GetDomain().(*domain.Domain).GetPDClient() without a
nil-check, which can panic for stores without PD; modify writeClusterID to
retrieve the PD client into a variable, check if pdClient == nil (or the domain
assertion fails) and return early (or log/handle) when nil, and only call
pdClient.GetClusterID(ctx) when non-nil before passing clusterID into
mustExecute; reference the symbols writeClusterID, sessiontypes.Session,
s.GetDomain(), domain.Domain, GetPDClient(), GetClusterID(), mustExecute, and
tidbClusterID to locate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6b01ebf1-79f8-4f73-8120-0f315b9802ee
📒 Files selected for processing (2)
pkg/session/bootstrap.gopkg/session/bootstrap_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/session/bootstrap_test.go
| // writeClusterID writes cluster id into mysql.tidb | ||
| func writeClusterID(s sessiontypes.Session) { | ||
| ctx, cancel := context.WithTimeout(context.Background(), time.Duration(internalSQLTimeout)*time.Second) | ||
| defer cancel() | ||
|
|
||
| clusterID := s.GetDomain().(*domain.Domain).GetPDClient().GetClusterID(ctx) | ||
|
|
||
| mustExecute(s, `INSERT HIGH_PRIORITY INTO %n.%n VALUES (%?, %?, "TiDB Cluster ID.") ON DUPLICATE KEY UPDATE VARIABLE_VALUE= %?`, | ||
| mysql.SystemDB, | ||
| mysql.TiDBTable, | ||
| tidbClusterID, | ||
| clusterID, | ||
| clusterID, | ||
| ) |
There was a problem hiding this comment.
Guard the PD client before fetching cluster_id.
(*domain.Domain).GetPDClient() can return nil when the store is not kv.StorageWithPD, so this dereference will panic during bootstrap/upgrade on uni-store/mock-store style deployments.
🛠️ Suggested fix
// writeClusterID writes cluster id into mysql.tidb
func writeClusterID(s sessiontypes.Session) {
+ pdClient := domain.GetDomain(s).GetPDClient()
+ if pdClient == nil {
+ logutil.BgLogger().Warn("skip writing cluster_id: PD client unavailable during bootstrap/upgrade")
+ return
+ }
+
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(internalSQLTimeout)*time.Second)
defer cancel()
- clusterID := s.GetDomain().(*domain.Domain).GetPDClient().GetClusterID(ctx)
+ clusterID := pdClient.GetClusterID(ctx)
mustExecute(s, `INSERT HIGH_PRIORITY INTO %n.%n VALUES (%?, %?, "TiDB Cluster ID.") ON DUPLICATE KEY UPDATE VARIABLE_VALUE= %?`,
mysql.SystemDB,
mysql.TiDBTable,As per coding guidelines, "Keep error handling actionable and contextual; avoid silently swallowing errors in Go code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/session/bootstrap.go` around lines 3327 - 3340, The writeClusterID
function currently dereferences the PD client returned by
s.GetDomain().(*domain.Domain).GetPDClient() without a nil-check, which can
panic for stores without PD; modify writeClusterID to retrieve the PD client
into a variable, check if pdClient == nil (or the domain assertion fails) and
return early (or log/handle) when nil, and only call pdClient.GetClusterID(ctx)
when non-nil before passing clusterID into mustExecute; reference the symbols
writeClusterID, sessiontypes.Session, s.GetDomain(), domain.Domain,
GetPDClient(), GetClusterID(), mustExecute, and tidbClusterID to locate the
change.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release-8.5 #65564 +/- ##
================================================
Coverage ? 55.9887%
================================================
Files ? 1821
Lines ? 655880
Branches ? 0
================================================
Hits ? 367219
Misses ? 261862
Partials ? 26799
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/session/bootstrap_test.go (1)
2631-2633: UseunsetStoreBootstrappedhelper for consistency.This test directly manipulates
storeBootstrappedLockandstoreBootstrappedmap, while all other tests in this file use theunsetStoreBootstrapped(store.UUID())helper function for the same purpose.♻️ Suggested fix for consistency
// upgrade to current version dom.Close() - storeBootstrappedLock.Lock() - delete(storeBootstrapped, store.UUID()) - storeBootstrappedLock.Unlock() + unsetStoreBootstrapped(store.UUID()) domCurVer, err := BootstrapSession(store)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/session/bootstrap_test.go` around lines 2631 - 2633, The test is directly manipulating storeBootstrappedLock and the storeBootstrapped map; replace those three lines with a call to the existing helper unsetStoreBootstrapped using the store's UUID: call unsetStoreBootstrapped(store.UUID()) instead of Lock/Delete/Unlock to match other tests and keep behavior consistent with the helper implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/session/bootstrap_test.go`:
- Around line 2631-2633: The test is directly manipulating storeBootstrappedLock
and the storeBootstrapped map; replace those three lines with a call to the
existing helper unsetStoreBootstrapped using the store's UUID: call
unsetStoreBootstrapped(store.UUID()) instead of Lock/Delete/Unlock to match
other tests and keep behavior consistent with the helper implementation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 17a12d5c-8c64-4aaa-a6db-b7804a123730
📒 Files selected for processing (1)
pkg/session/bootstrap_test.go
[LGTM Timeline notifier]Timeline:
|
|
/retest |
|
/retest |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/session/bootstrap.go (1)
1450-1452:⚠️ Potential issue | 🔴 CriticalAdd
upgradeToVer226to the upgrade dispatch list.
upgradeToVer226is defined below, but this slice still stops atupgradeToVer225. That means an existing cluster can be marked as bootstrap version 226 without ever writing the newcluster_idrow.🛠️ Suggested fix
upgradeToVer223, upgradeToVer224, upgradeToVer225, + upgradeToVer226, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/session/bootstrap.go` around lines 1450 - 1452, The upgrade dispatch list currently ends at upgradeToVer225 but upgradeToVer226 is implemented separately; update the upgrades slice (the list that includes upgradeToVer224, upgradeToVer225, etc.) to append upgradeToVer226 so the bootstrap path executes the new upgrade logic (ensuring the new cluster_id row gets written) by adding upgradeToVer226 to that slice.
♻️ Duplicate comments (1)
pkg/session/bootstrap.go (1)
3350-3364:⚠️ Potential issue | 🔴 CriticalGuard the PD client before fetching
cluster_id.This still dereferences the PD client unconditionally.
pkg/domain/domain.go:1853-1860returnsnilwhen the store is notkv.StorageWithPD, so bootstrap/upgrade will panic on uni-store or mock-store style deployments.🛠️ Suggested fix
// writeClusterID writes cluster id into mysql.tidb func writeClusterID(s sessiontypes.Session) { + pdClient := domain.GetDomain(s).GetPDClient() + if pdClient == nil { + logutil.BgLogger().Warn("skip writing cluster_id during bootstrap/upgrade: PD client unavailable") + return + } + ctx, cancel := context.WithTimeout(context.Background(), time.Duration(internalSQLTimeout)*time.Second) defer cancel() - clusterID := s.GetDomain().(*domain.Domain).GetPDClient().GetClusterID(ctx) + clusterID := pdClient.GetClusterID(ctx) mustExecute(s, `INSERT HIGH_PRIORITY INTO %n.%n VALUES (%?, %?, "TiDB Cluster ID.") ON DUPLICATE KEY UPDATE VARIABLE_VALUE= %?`, mysql.SystemDB,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/session/bootstrap.go` around lines 3350 - 3364, In writeClusterID, avoid unconditionally dereferencing the PD client returned from s.GetDomain().(*domain.Domain).GetPDClient(); first obtain the domain and pdClient (use the same symbols: writeClusterID, s.GetDomain(), domain.Domain, GetPDClient, GetClusterID, tidbClusterID, mustExecute), check if pdClient is nil (or assert StorageWithPD where appropriate) and only call GetClusterID when pdClient != nil; if nil, skip the INSERT or insert a safe default/omit the variable, and keep the existing context/cancel logic intact so bootstrap/upgrade won’t panic on uni-store or mock-store deployments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/session/bootstrap.go`:
- Around line 1450-1452: The upgrade dispatch list currently ends at
upgradeToVer225 but upgradeToVer226 is implemented separately; update the
upgrades slice (the list that includes upgradeToVer224, upgradeToVer225, etc.)
to append upgradeToVer226 so the bootstrap path executes the new upgrade logic
(ensuring the new cluster_id row gets written) by adding upgradeToVer226 to that
slice.
---
Duplicate comments:
In `@pkg/session/bootstrap.go`:
- Around line 3350-3364: In writeClusterID, avoid unconditionally dereferencing
the PD client returned from s.GetDomain().(*domain.Domain).GetPDClient(); first
obtain the domain and pdClient (use the same symbols: writeClusterID,
s.GetDomain(), domain.Domain, GetPDClient, GetClusterID, tidbClusterID,
mustExecute), check if pdClient is nil (or assert StorageWithPD where
appropriate) and only call GetClusterID when pdClient != nil; if nil, skip the
INSERT or insert a safe default/omit the variable, and keep the existing
context/cancel logic intact so bootstrap/upgrade won’t panic on uni-store or
mock-store deployments.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c43b1f7b-f1de-4a36-83d3-c0702e54e862
📒 Files selected for processing (2)
br/pkg/restore/snap_client/systable_restore_test.gopkg/session/bootstrap.go
🚧 Files skipped from review as they are similar to previous changes (1)
- br/pkg/restore/snap_client/systable_restore_test.go
…511-to-release-8.5 Signed-off-by: Yang Keao <yangkeao@chunibyo.icu>
f26c5e1 to
f9e9b4b
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Defined2014, lance6716, Leavrth, yudongusa The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@yudongusa Yes! Thanks for reminding 🍻 . They have been prepared pingcap/docs#22566 and pingcap/docs-cn#21439. |
|
/unhold |
|
/retest |
4 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
This is an automated cherry-pick of #59511
What problem does this PR solve?
Issue Number: close #59476
Problem Summary:
We need a mechanism to detect whether two TiDB servers are in the same tidb cluster. Adding the
cluster_idto themysql.tidbtable is the most sutiable way. I've also considered other options in #59476 .What changed and how does it work?
Add the
cluster_idrow to themysql.tidbtable.Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit